ABSTRACT


The demand for efficient intrusion detection systems has grown as attackers continue to create new
attacks and networks expand. Many techniques have recently been proposed for network intrusion
detection systems (NIDSs). However, new threats are constantly emerging that lie outside the reach of
existing systems. The intrusion detection algorithms now in use suffer from many issues, including high
error rates, high dimensionality, high false alarm and false negative rates, redundancy, and meaningless
data. Given the exceptional performance of deep learning in various detection and recognition tasks, we
present a novel and efficient deep learning-based NIDS in this research. First, the raw input data are
preprocessed through encoding and normalization. The preprocessed data are then fed into the feature
extraction phase, where features are extracted using the SE-ResNeXt-101 approach. The essential
features are then selected with an Improved Binary Dandelion Algorithm (IBDA). The proposed Improved
Residual Dense Network (IRDN) is employed to identify attacks, which enhances security and privacy
inside the network framework. The lyrebird optimization technique is used to further tune the
hyperparameters of the IRDN approach to increase performance. The Modified Generator GAN (MG-GAN)
algorithm addresses the data imbalance issue. The results show that the suggested technique outperforms
current NIDS methods on the assessment metrics. Additionally, the method is better suited to complex
network intrusion detection requirements.


Keywords: Network Intrusion Detection Systems (NIDSs), SE-ResNeXt-101, Improved Binary Dandelion
Algorithm (IBDA), Modified Generator GAN (MG-GAN), Improved Residual Dense Network (IRDN).
Symbols and Abbreviations

Term                                    Abbreviation
Network Intrusion Detection Systems     NIDS
Improved Binary Dandelion Algorithm     IBDA
Improved Residual Dense Network         IRDN
Modified Generator GAN                  MG-GAN
Hierarchical Adversarial Attack         HAA
Graph Neural Network                    GNN
Adaptive Synthetic                      ADASYN
Long Short-Term Memory                  LSTM
One-side selection                      OSS
Deep Neural Network                     DNN
Kullback-Leibler divergence             KL
Jensen-Shannon divergence               JSD
Squeeze-and-excitation                  SE
Core dandelions                         CD
Global Average Pooling                  GAP
Lyrebird optimization algorithm         LOA

Symbol          Description
n               the total number of specimens
x               the number of samples
i               loop counter
P, and P,       likelihood distributions
u_R             parameter value for the R-th filter
r and e         wilting and growth factors
Fe_max          maximum assessment time
f_avg           overall mean value of fitness of dandelions
SN              the total of all dandelions and seeds
Scope(t)_i      the radius of seeding in the i-th dimension
r               random integer
X_i             possible outcome
lb and ub       lower and upper bounds
F_i             objective function


1. INTRODUCTION 


The importance of network security has increased with the rapid
development of big data, cloud computing, and related technologies,
along with the growing dependence of our everyday communications on
networked services [1]. These factors have rendered networks
indispensable, and any weakness or threat can affect the network as a
whole. Traditional security measures such as firewalls and encryption
struggle in an environment where attackers continually create more
sophisticated attacks [2, 3]. Cybersecurity researchers have therefore
recognized how crucial it is to develop effective intrusion detection
systems (IDSs) to keep networks safe [4]. IDSs aim to preserve
availability, confidentiality, and integrity: they work to stop illegal
access to networks, safeguard the data and communication systems
within them, and above all identify suspected and unidentified risks
and attacks with a low false alarm rate and high precision [5-7].


IDSs operate using two methods: signature-based and anomaly-based
detection [8]. Anomaly-based detection recognizes an attack from
unusual patterns of user behaviour, so an attack may be identified
when users behave strangely. In contrast, signature-based detection
uses a known set of rules or indicators from the system's attack
databases to determine whether behaviour is malicious [9, 10].


In anomaly-based IDSs, behaviour is analysed in conjunction with
various machine learning (ML) and data mining methods to achieve the
best performance in identifying attack activities [11-13]. These
techniques intelligently recognize, and offer a new perspective on,
the various attacks plaguing computer networks worldwide [14, 15].
However, several issues remain in applying machine learning to IDSs.
Developing a suitable model to describe the datasets is the most
significant obstacle facing machine learning techniques [16-18].
Numerous methods and learning strategies have been used to build
models for efficient intrusion detection systems. Nevertheless,
current models show poor accuracy, high false alarm rates, and low
detection rates [19, 20].


Journal of Theoretical and Applied Information Technology
30th April 2024. Vol. 102. No. 8
© Little Lion Scientific
ISSN: 1992-8645    E-ISSN: 1817-3195


The NIDS problem was selected after a thorough analysis of the
cybersecurity environment, which is characterized by the increasing
complexity and variety of cyber threats directed at network
infrastructures. We have concentrated on NIDS because they protect
network resources and identify malicious activity, unauthorized
access, and unusual network traffic. It is imperative to recognize,
though, that the efficacy of NIDS can differ based on the particular
kinds of intrusions targeted and the features of the network
environment. While NIDS can detect a wide range of intrusions,
including network-based attacks such as DDoS, network scanning, and
DoS, their effectiveness in identifying particular intrusions may be
affected by traffic volume, network topology, and attack complexity.
Thus, even though NIDS are essential to network defence strategies,
their effectiveness and applicability to various intrusions require
careful thought and customization to match the unique needs and
difficulties of different network environments.

Many models based on different approaches and learning techniques
have been, and are still being, developed to build an effective
intrusion detection system. Current models exhibit low detection
rates, high false alarm rates, and poor precision. Deep learning-based
NIDSs have been shown to outperform conventional machine learning
models in accuracy. However, because of class imbalances in the
benchmark datasets, they cannot reliably identify attacks with little
traffic. Contemporary benchmark datasets for intrusion detection
feature class imbalances in which the amount of normal traffic
significantly exceeds that of attacks, mirroring real-world network
traffic. Even among the various attack types, specific attacks appear
far more frequently than others. As a result, the NIDS performs worse
overall and has trouble identifying some kinds of attacks. Unbalanced
data have received insufficient attention in recent NIDS research
even though they impair the NIDS's ability to detect attacks
accurately. This paper attempts to address these concerns and develop
a more effective detection model. The novel improved residual dense
network is employed to identify attacks, which enhances detection
accuracy. The essential features are extracted with SE-ResNeXt-101
and selected with the improved binary dandelion algorithm (IBDA). Our
findings show that deep learning methods increase the model's
accuracy and its resistance to threats and attacks by raising its
detection rate and efficiency.


The remainder of this work is arranged as follows: Section 2 briefly
summarizes the prior works. Section 3 outlines the technique of the
suggested intrusion detection system. In Section 4, experimental
findings are described. Section 5 provides a final presentation of
the findings and recommendations for future study.


Our Contributions 


An effective security solution based on a novel deep learning
technique is suggested for an IDS to improve cloud computing
security. The primary contributions of our proposed work are
summarized as follows:


• A modified generator GAN (MG-GAN) approach is proposed in this
  paper to tackle the class imbalance issue.

• Feature extraction is carried out using the SE-ResNeXt-101
  approach. By analysing the extracted features and comparing them
  with known attack signatures, the NIDS improves overall network
  security through an enhanced capacity to identify and react to
  unauthorized or suspicious activity.

• This work presents a feature selection method that uses the
  improved binary dandelion algorithm to address the problem of
  feature redundancy.

• The novel improved residual dense network is used to classify
  intrusions, with its parameters tuned using the lyrebird
  optimization algorithm.

• Lastly, we compare our methodology against several existing IDS
  approaches and evaluate its effectiveness using three benchmark
  datasets: WSN-DS, BoT-IoT, and CICDDoS2019.


2. RELATED WORKS 


This section examines the prior research that is most pertinent to
our study, such as adversarial deep learning and recent intrusion
detection approaches.

To realize level-aware black-box adversarial attack strategies, Zhou
et al. [21] introduced a Hierarchical Adversarial Attack (HAA)
generation approach that targets Graph Neural Network (GNN)-based
IDSs in IoT systems under a constrained budget. By building a shadow
GNN model, an intelligent mechanism based on a saliency map technique
is designed to produce adversarial examples by efficiently
identifying and modifying essential feature elements with minimal
perturbations. A hierarchical node selection method was created to
choose a set of more vulnerable nodes with a high attack priority.


Liu et al. [22] suggest a NIDS built on LightGBM and Adaptive
Synthetic (ADASYN) oversampling. First, to prevent features with very
large or very small values from dominating, they preprocess the
original data with normalization and one-hot encoding. Second, to
address the poor detection rate for minority attacks caused by
unbalanced training data, they employ ADASYN oversampling to increase
the number of minority samples. Finally, the LightGBM ensemble
learning method is used to further reduce the system's time
complexity while maintaining detection precision.


For a more effective intrusion detection system, Al et al. [23]
suggested using a hybrid deep learning (HDL) network made up of Long
Short-Term Memory (LSTM) and a CNN. Furthermore, data imbalance
processing was employed to lessen the impact of data imbalance on the
system's efficiency. This processing combined the SMOTE and Tomek
links sampling methods, known together as STL.


Jiang et al. [24] provide a network intrusion detection technique
that combines hybrid sampling with a deep hierarchical network.
First, they reduce the noise in the majority classes using one-side
selection (OSS) and then increase the minority instances using
SMOTE. Establishing a balanced data set in this manner can
significantly decrease the time needed to train the model and lets
the system fully learn the properties of minority instances. Second,
they create a deep hierarchical network framework that extracts
spatial features using convolutional neural networks (CNNs) and
temporal features using BiLSTM.


Kunang et al. [25] suggest combining a pretraining strategy based on
a deep autoencoder (PTDAE) with a Deep Neural Network (DNN) to create
a deep IDS. Hyperparameter optimization techniques were used to build
the models. Through an automatic hyperparameter optimization approach
that incorporates grid search and random search methods, this study
offers an alternative to DL framework systems. The approach makes it
easier to find the ideal categorical hyperparameter configuration and
hyperparameter values to enhance detection efficiency.

The efficacy of NIDS in protecting network environments is contingent
upon resolving significant deficiencies identified in the recent
literature. Research has highlighted several critical issues that
NIDS must address: limited capacity to scale in expansive and dynamic
network environments [21], inability to identify zero-day attacks
with conventional signature-based and anomaly-based techniques [22],
high false positive rates that result in alert fatigue and decreased
effectiveness [23, 24], and the need for more timely and adaptive
detection mechanisms to keep up with evolving threats [25]. In light
of these findings, the problem statement is structured to address
these particular gaps, highlighting the necessity of creating more
reliable NIDS solutions to improve network security. The following
study objectives were set to address the issues mentioned above:


• To present the best preprocessing approach for normalization and
  data encoding of the raw data set. Furthermore, the proposed model
  simply requires real-time data usage; preprocessing is optional.

• To create a feature selection method that effectively prevents
  overfitting caused by the high dimensionality of the feature space,
  lowering the IDS model's complexity.

• To develop a novel classifier with a low error rate and high
  accuracy for detecting and classifying the type of intrusion or
  attack.

• To demonstrate the efficacy of our proposed model by validating it
  on three standard benchmark datasets: BoT-IoT, CICDDoS2019, and
  WSN-DS.


3. PROPOSED METHODOLOGY 


Network traffic is continuously monitored and analysed by network
intrusion detection systems (NIDSs) to spot potentially harmful or
security-threatening activity. The dynamic nature of cyber threats
has made it necessary for detection algorithms to undergo constant
upgrades and enhancements, which could make some of the older NIDS
research less adaptable and possibly obsolete. This research
introduces a novel detection and classification approach based on an
improved residual dense network (IRDN). The pipeline comprises
preprocessing, feature extraction, feature selection, and
classification. The modified generator GAN (MG-GAN) algorithm tackles
the class imbalance problem. The SE-ResNeXt-101 approach is employed
to extract the essential features, which are then selected using the
improved binary dandelion algorithm (IBDA). Finally, the
hyperparameters of the IRDN approach are fine-tuned using the
lyrebird optimization algorithm (LOA). The architecture of the
proposed approach is shown in Figure 1.

Figure 1. Overall framework of the proposed methodology.


3.1. Data Preprocessing 


The preprocessing stage aims to provide a 
smoothed and organized dataset, which will serve 
as the basis for more precise and effective 
intrusion detection in the NIDS’s subsequent 
phases of analysis. 


3.1.1. load datasets 


The datasets that we utilized are openly accessible. The information
is kept in a pcap-formatted CSV file. At this stage, each dataset was
inspected using the Pandas package and then cleaned to remove null
and redundant records in preparation for the following step.
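
The cleaning step above can be sketched as follows. This is a minimal illustration that uses only Python's standard library so that it runs without dependencies; with Pandas, `DataFrame.dropna` and `DataFrame.drop_duplicates` play the same role. The column names and the tiny inline CSV are hypothetical and merely stand in for a real dataset file.

```python
import csv
import io

# Hypothetical excerpt of a flow-level CSV; real datasets have many more columns.
RAW_CSV = """duration,bytes,label
0.5,120,benign
0.5,120,benign
1.2,,malicious
3.1,900,malicious
"""

def load_and_clean(text):
    """Parse CSV text, drop rows with missing values, then drop exact duplicates."""
    rows = list(csv.DictReader(io.StringIO(text)))
    complete = [r for r in rows if all(v not in ("", None) for v in r.values())]
    seen, unique = set(), []
    for r in complete:
        key = tuple(r.items())
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            unique.append(r)
    return unique

cleaned = load_and_clean(RAW_CSV)
print(len(cleaned), "rows kept")
```

Rows with an empty field are removed before duplicates are dropped, so a record that is both incomplete and repeated never reaches the next stage.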


3.1.2. data encoding 

Label encoding of the datasets is carried out at this stage. Deep
learning approaches operate on numerical values, but not all dataset
labels are numerical; we therefore encoded the label column by
converting the benign or malicious values to integers using the
one-hot encoder.
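
As a rough illustration of this encoding step, the sketch below maps string labels to integers and then to one-hot vectors. The label values and helper names are hypothetical; the paper does not specify its exact encoder configuration.

```python
def fit_label_index(labels):
    """Map each distinct label to an integer, in sorted order for reproducibility."""
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}

def one_hot(index, num_classes):
    """Return a list with a single 1 at position `index`."""
    return [1 if k == index else 0 for k in range(num_classes)]

# Hypothetical label column of a traffic dataset.
labels = ["benign", "malicious", "benign", "malicious", "malicious"]
label_index = fit_label_index(labels)
encoded = [label_index[name] for name in labels]
one_hot_rows = [one_hot(i, len(label_index)) for i in encoded]
print(label_index, encoded)
```

The same mapping must be reused for the test data so that a given class always receives the same integer.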


3.1.3. data normalization 

Normalization is a preprocessing method that brings features into a
common range. Learning effectiveness is affected by the variability
of the information read from the CSV file, whose features have
different means and standard deviations. We used a standard scaler
to scale the input data to our model. The datasets were normalized
using the standard scaler from the “sklearn.preprocessing” library.
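
For readers who want to see what standard scaling does to a single feature, here is a dependency-free sketch. It applies z = (x - mean) / std with the population standard deviation, which is the per-feature transformation a standard scaler computes. The feature values are hypothetical, and the handling of constant columns is a choice made for this sketch.

```python
import math

def standardize(column):
    """Scale values to zero mean and unit variance (population standard deviation)."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance) or 1.0  # constant columns are left centred at zero
    return [(v - mean) / std for v in column]

packet_sizes = [60.0, 1500.0, 60.0, 900.0]  # hypothetical feature values
scaled = standardize(packet_sizes)
print([round(v, 3) for v in scaled])
```

In practice the mean and standard deviation would be estimated on the training split only and then reused for the validation and test splits.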


3.1.4. data splitting 


The data sets used for modelling are split into two parts: training
and testing. To further enhance the algorithm's effectiveness during
training, we separate the training data into training and validation
sets.
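
A minimal sketch of such a three-way split is shown below. The ratios (20% test, 10% validation) and the fixed seed are assumptions made for illustration, since the text does not state the proportions used.

```python
import random

def split_dataset(rows, test_ratio=0.2, val_ratio=0.1, seed=42):
    """Shuffle rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_test = int(len(shuffled) * test_ratio)
    n_val = int(len(shuffled) * val_ratio)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

For heavily imbalanced intrusion data, a stratified split that preserves class proportions in each part is usually preferable to the plain shuffle used here.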


3.1.5. MG-GAN for data imbalance 


Compared with earlier DL methods, GANs are far more sophisticated.
In conventional deep learning, we attempt to classify, cluster, or
forecast after training the algorithm with the available data,
whereas a GAN creates new data. A GAN has two primary components.
The generator creates instances without knowing the characteristics
of the provided dataset, and the discriminator attempts to
distinguish between original samples and generated samples. The
discriminator processes the generated samples and the training
instances separately. We train the discriminator and the generator
in a GAN as feed-forward networks with dropout. The two blocks
cooperate, albeit adversarially, to improve one another. The
discriminator closely studies the original samples and gives the
generator feedback on the synthetic samples it produces. Using this
feedback, the generator attempts to produce new samples that closely
resemble the original dataset.


The probability factor of error is defined as

$$ \mathrm{lik}(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \tag{1} $$

where x represents the number of samples produced by the generator,
i serves as a loop counter, and n is the total number of specimens
sent as inputs to the discriminator. The proximity between two
samples is calculated to determine whether the resulting sample is
acceptable.


The average error is defined as

$$ \frac{(p+q)^2 - (p-q)^2}{n} \tag{2} $$

The difference between the generated sample and the original sample
from a database with n instances is calculated using the data points
p and q. The expected probability distribution is normally compared
using one of two measures. The Kullback-Leibler (KL) divergence is
the conventional choice, but it is asymmetric. The Jensen-Shannon
divergence (JSD), which measures how similar two probability
distributions are, is used by GANs.


The JSD is represented as

$$ \mathrm{JSD}(P \| Q) = \frac{1}{2} D(P \| M) + \frac{1}{2} D(Q \| M) \tag{3} $$

where M represents the average of P and Q, and $D(P \| M)$ denotes
the divergence of the probability distribution P from that average.
We suggest a modified generator GAN in which, as in a conventional
GAN, the discriminator is trained using the original data. Contrary
to the fundamentals of a standard GAN, however, the generator
receives both the original data and multivariate noise with
distribution P, as inputs. We use a Gaussian covariant distribution,
where P, and P; represent the likelihood distributions of the
synthesized and original data, respectively. The dimension of this
distribution helps define the latent space, which specifies the
intrinsic range of variation. This makes it easier for the generator
to produce samples inside the allowed latent space. The
discriminator accepts the synthetic data as genuine because they lie
within the defined region and closely resemble the actual data. The
generator can therefore be trained without repeatedly iterating the
feedback loop, which reduces the computational cost. The two
adversarial blocks operate using a min-max strategy.


In supervised learning scenarios, class imbalance is an issue that
can be strategically addressed with MG-GAN, which is mainly intended
for generating realistic data instances. An important tactic is to
use MG-GANs for data augmentation, particularly for minority
classes. MG-GANs greatly enrich unbalanced datasets, helping the
model generalize better to minority classes with sparse real-world
samples. To accomplish this, the generator is trained to produce
artificial samples of underrepresented classes. Additionally,
MG-GANs can be used in conjunction with minority class oversampling
to create a more evenly distributed class population. Focused MG-GAN
training ensures that the samples produced closely resemble specific
classes, especially minority categories, while concurrently training
the discriminator to distinguish between produced and actual
samples.
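
To make the contrast between the two divergence measures discussed in this subsection concrete, the following sketch computes the KL divergence and the Jensen-Shannon divergence for two small discrete distributions. The distributions are hypothetical and are not taken from the paper's datasets.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q from their mixture M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

real = [0.7, 0.2, 0.1]       # hypothetical class distribution of real samples
synthetic = [0.5, 0.3, 0.2]  # hypothetical distribution of generated samples
print(kl_divergence(real, synthetic), kl_divergence(synthetic, real))
print(js_divergence(real, synthetic), js_divergence(synthetic, real))
```

Swapping the arguments changes the KL value but not the JSD value, and the JSD stays bounded by ln 2 even when the two distributions do not overlap.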


3.2. Feature Extraction Using SE-ResNeXt-101 


Feature extraction in network intrusion detection systems is the
process of extracting pertinent data from raw network traffic and
turning them into useful features. These features capture the traits
and patterns associated with malicious and benign network activity.
SE-ResNeXt-101 was chosen for feature extraction over other CNNs
because of its deep architecture, which combines
squeeze-and-excitation (SE) blocks, residual learning with skip
connections, and an ensemble-like design controlled by its
cardinality (the "Next" dimension). The model's depth allows it to
identify intricate hierarchical features, and the SE blocks enhance
feature recalibration by emphasizing the most relevant channels. The
ensemble-like aggregation via cardinality further improves its
capacity to identify a wide range of patterns. SE-ResNeXt-101 is
well known for its strong performance and the availability of
pretrained models, which provide a strong basis for transfer
learning. In light of the stated objectives, the model is a strong
candidate for efficient feature extraction due to its widespread
acceptance in the research community and its capacity to satisfy
task-specific requirements.


Features were extracted from the input data using a pretrained
SE-ResNeXt-101 model. SE-ResNeXt-101-32x4d is a ResNeXt101-32x4d
variant with an additional squeeze-and-excitation component. A
squeeze-and-excitation block is a computational unit that can be
built on a transformation $F_{tr}$ that maps the input data
$Z \in S^{B' \times Y' \times R'}$ to feature maps
$V \in S^{B \times Y \times R}$. $F_{tr}$ was a convolutional
operator whose kernels were represented by
$U = [u_1, u_2, \ldots, u_R]$, where $u_r$ stands for the parameters
of the r-th filter. The outcome $V = [v_1, v_2, \ldots, v_R]$ can
thus be described by Formula (4):

$$ v_r = u_r * Z = \sum_{s=1}^{R'} u_r^{s} * z^{s} \tag{4} $$

In this case, $*$ denotes convolution,
$u_r = [u_r^{1}, u_r^{2}, \ldots, u_r^{R'}]$,
$Z = [z^{1}, z^{2}, \ldots, z^{R'}]$, and $v_r \in S^{B \times Y}$.
The 2D spatial kernel $u_r^{s}$ represents a single channel of $u_r$
and acts on the corresponding channel of Z. Bias terms were omitted
to simplify the notation. Because the result is the sum over all
channels, channel dependencies are implicitly encoded in $u_r$;
however, they are entangled with the local spatial relationships
that the filters capture. One potential solution to the problem of
exploiting channel dependencies is to squeeze global spatial
information into channel descriptors. Global average pooling was
used to generate channel-wise statistics to achieve this. In formal
terms, the statistic $a \in S^{R}$ was produced by shrinking V over
its spatial dimensions $B \times Y$, so that the r-th element of a
is given by Formula (5):

$$ a_r = \frac{1}{B \times Y} \sum_{i=1}^{B} \sum_{j=1}^{Y} v_r(i, j) \tag{5} $$

An additional operation was then carried out to fully capture
channel-wise dependencies using the information aggregated in the
squeeze operation.
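
The squeeze step described above amounts to averaging each channel over its spatial positions. The sketch below illustrates this with nested Python lists; the feature-map values and the function name `squeeze` are hypothetical, and a real implementation would operate on tensors inside the network.

```python
def squeeze(feature_maps):
    """Global average pooling: one scalar descriptor per channel."""
    descriptors = []
    for channel in feature_maps:  # each channel is a B x Y grid of activations
        total = sum(sum(row) for row in channel)
        descriptors.append(total / (len(channel) * len(channel[0])))
    return descriptors

# Hypothetical feature maps V with R = 2 channels of spatial size 2 x 2.
V = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
a = squeeze(V)
print(a)
```

Each entry of the result summarizes one channel, which is what allows the subsequent excitation step to reweight channels as a whole.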


The ResNeXt module is an adaptation of ResNet that bears a striking
resemblance to the Inception module. Both follow a
split-transform-merge approach, except that in ResNeXt the outputs
of the distinct paths are merged by addition rather than
depth-concatenated as in the Inception approach. Research showed
that increasing cardinality yielded greater accuracy gains than
widening or deepening the network. The results of this layer were
then depth-concatenated and fed to a 1 x 1 convolutional layer.

$$ e(Z) = \sum_{j=1}^{R} K_j(Z) \tag{6} $$

$K_j(Z)$ in (6) can be an arbitrary function. Analogous to a simple
neuron, $K_j$ projects Z into a (possibly low-dimensional) embedding
and then transforms it. R in Formula (6) indicates the size of the
set of transformations to be aggregated and is referred to as the
cardinality. The cardinality determines the number of more complex
transformations. The aggregated transformation in Formula (6) serves
as the residual function in Formula (7), where f denotes the output:

$$ f = Z + \sum_{j=1}^{R} K_j(Z) \tag{7} $$
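
The aggregated residual transformation, in which the input is added to the sum of several parallel branch outputs, can be illustrated with a toy example. The branches below are simple affine maps chosen purely for illustration; in the actual network each branch is a small stack of convolutions, and the helper names are hypothetical.

```python
def make_branch(scale, shift):
    """Return a toy transformation K_j; real branches are small conv stacks."""
    return lambda z: [scale * v + shift for v in z]

def aggregated_residual(z, branches):
    """Compute f = Z + sum_j K_j(Z), where len(branches) is the cardinality."""
    out = list(z)  # start from the identity (skip) path
    for branch in branches:
        for k, v in enumerate(branch(z)):
            out[k] += v
    return out

# Cardinality 3: three hypothetical parallel branches.
branches = [make_branch(0.5, 0.0), make_branch(-0.25, 1.0), make_branch(0.1, 0.0)]
f = aggregated_residual([2.0, 4.0], branches)
print(f)
```

With an empty list of branches the function returns its input unchanged, which is the identity mapping that makes residual blocks easy to optimize.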


The residual learning and attention mechanisms underpinning
SE-ResNeXt-101 make it suitable for feature extraction tasks such as
those found in NIDSs. It uses residual connections, inherited from
the ResNet design, to learn the difference between input and output,
making deep network training easier. Squeeze-and-excitation (SE)
blocks improve discriminative power by adaptively scaling channel
importance, further refining the feature maps. The cardinality
parameter (the "Next" dimension) creates an ensemble of paths within
every block, which lets the network capture a variety of patterns.
The deep structure and high model capacity of SE-ResNeXt-101 enable
it to learn complex hierarchical features, making it an excellent
choice for extracting features in network intrusion detection
systems.


3.3. Feature Selection 


After the data have been preprocessed, the appropriate features for
detecting network intrusions are chosen using the second percentile
approach and recursive feature elimination. Selecting the best
features significantly decreases the system's computational
complexity. The features are selected using the improved binary
dandelion algorithm.


3.3.1. binary dandelion algorithm (BDA) 


The binary dandelion algorithm (BDA) was proposed for discrete
search spaces. The entire search domain in BDA is called “land,” and
it can be classified as fertile or poor. Core dandelions (CDs) are
dandelions that grow on fertile land; assistant dandelions (ADs) are
dandelions that grow on poor land. The seeds that a dandelion sows
are dispersed over the surrounding area. The process comprises four
stages: initialization, normal sowing, mutation seeding, and
selection.

step 1: initialization.

In feature selection problems, every seed encodes a candidate set of
selected features. Given a D-dimensional data set, we encode a seed
as $X = (x_1, x_2, \ldots, x_D)$, where $x_i \in \{0, 1\}$ for
i = 1, 2, ..., D. If $x_i = 1$, the ith feature is selected;
otherwise, the ith feature is not selected. At the start of BDA, N
dandelions $\{X_1, X_2, \ldots, X_N\}$ are randomly created, and f
denotes the fitness function. The dandelions other than the core
dandelion are assistant dandelions.
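
The binary encoding used at initialization can be sketched as follows. The population size, feature count, and function names are hypothetical and chosen only to illustrate the idea of representing a feature subset as a 0/1 vector.

```python
import random

def init_population(num_dandelions, num_features, rng):
    """Create random binary vectors; a 1 at position i selects feature i."""
    return [[rng.randint(0, 1) for _ in range(num_features)]
            for _ in range(num_dandelions)]

def selected_features(dandelion):
    """Return the indices of the features that a dandelion selects."""
    return [i for i, bit in enumerate(dandelion) if bit == 1]

rng = random.Random(7)  # seeded so the sketch is reproducible
population = init_population(num_dandelions=4, num_features=6, rng=rng)
for dandelion in population:
    print(dandelion, selected_features(dandelion))
```

A fitness function would then score each vector, for example by training a classifier on the selected columns only.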


step 2: normal sowing. 


Every dandelion's fitness value determines the number of seeds it
produces in BDA. The number of seeds $M_i$ that a dandelion yields
is given by the following equation:

$$ M_i = \begin{cases} \text{max} \times \dfrac{f_{max} - f(X_i) + \varepsilon}{f_{max} - f_{min} + \varepsilon}, & M_i > \text{min} \\ \text{min}, & M_i \le \text{min} \end{cases} \tag{8} $$

In the formula, $f_{max} = \max_{i=1,2,\ldots,N} f(X_i)$ and
$f_{min} = \min_{i=1,2,\ldots,N} f(X_i)$, where max and min denote
the highest and lowest number of seeds, respectively, and
$\varepsilon$ is a small constant that keeps the value of the
denominator from going to 0. Every dandelion may only sow inside its
seeding radius, which is the specific range surrounding it. The
seeding radius of a core dandelion (CD) differs from that of an AD.
The CD's seeding radius is determined by the following equation:

$$ R_{CD}^{t}(i, j) = \begin{cases} Bound, & t = 1 \\ R_{CD}^{t-1}(i, j) \times r + r_1, & a = 1 \\ R_{CD}^{t-1}(i, j) \times e + r_2, & a \ne 1 \end{cases} \tag{9} $$

In this case, Bound is the greatest seeding radius. The wilting and
growth factors are denoted by r and e, respectively, with
$e \in [1, 1.1]$ and $r \in [0.9, 1]$. $r_1$ and $r_2$ are two
random numbers in the range [-0.5, 0.5]. The quantity a indicates
whether the present generation has found a better fitness value than
the preceding generation and is computed as:

$$ a = \frac{f_{CD}(t) + \varepsilon}{f_{CD}(t-1) + \varepsilon} \tag{10} $$

where $f_{CD}(t)$ is the fitness value of the CD in the t-th
generation and $\varepsilon$ is a very small constant that keeps the
denominator from going to zero. The seeding radius of the ADs is
determined by:

$$ R_{AD}^{t}(i, j) = \begin{cases} Bound, & t = 1 \\ w \times R_{AD}^{t-1}(i, j) + r_3 \times \left( r_4 \times X_{CD}(j) - X_{AD}(i, j) \right), & t > 1 \end{cases} \tag{11} $$

where the weight factor w is determined using the following
equation:

$$ w = 1 - \frac{Fe}{Fe_{max}} \tag{12} $$

$Fe_{max}$ is the maximum assessment time (the maximum number of
function evaluations), while Fe is the present assessment time. The
seeding radius determines every seed's location.
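
As we read Formula (8), a dandelion's seed count scales with how far its fitness lies below the worst fitness in the population, and is floored at the minimum seed count. The sketch below illustrates that reading. Rounding to an integer and the parameter names `max_seeds` and `min_seeds` are assumptions of this sketch, as are the example fitness values.

```python
def seed_counts(fitness, max_seeds, min_seeds, eps=1e-12):
    """Seeds per dandelion; lower fitness is treated as better (minimization)."""
    f_max, f_min = max(fitness), min(fitness)
    counts = []
    for f in fitness:
        m = max_seeds * (f_max - f + eps) / (f_max - f_min + eps)
        counts.append(max(min_seeds, round(m)))  # never fewer than min_seeds
    return counts

# Hypothetical fitness values (e.g. classification error) of three dandelions.
counts = seed_counts([0.10, 0.25, 0.40], max_seeds=20, min_seeds=2)
print(counts)
```

The best dandelion receives the maximum number of seeds, while the worst one still sows the minimum, which keeps some exploration around poor solutions.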


Because the procedure operates in a discrete space, we employ a
transfer function to map the seeding radius to the flip probability
p(i, j) of the corresponding vector element.

Based on prior research, the V-shaped curve
$F(x) = \left| \mathrm{erf}\left( \frac{\sqrt{\pi}}{2} x \right) \right|$
is selected as the transfer function of BDA. As a result, the
following equation yields the flip probability:

$$ p^{t}(i, j) = \begin{cases} F\left(R_{CD}^{t}(i, j)\right), & \text{the seed comes from a core dandelion} \\ F\left(R_{AD}^{t}(i, j)\right), & \text{the seed comes from an assistant dandelion} \end{cases} \tag{13} $$

Therefore, the equation for spreading seeds is:

$$ X^{t+1}(i, j) = \begin{cases} 1 - X^{t}(i, j), & rand() < p^{t}(i, j) \\ X^{t}(i, j), & rand() \ge p^{t}(i, j) \end{cases} \tag{14} $$

step 3: mutation seeding. 


We put the seeds through a mutation seeding step to boost population
diversity and enhance the capacity to escape local optima. Within
every generation, a particular number of seeds undergo mutations, as
described by Equation (15).


step 4: selection strategy.


Only a portion of the seeds and dandelions in every generation can
make it to the following round. Naturally, there is a greater chance
that seeds with higher fitness values will enter the following
cycle. We use the following equations to determine every seed's
likelihood of survival:

$$ p_i = \frac{f_i}{\sum_{k=1}^{SN} f_k} \tag{16} $$

$$ f_i = \left| f(X_i) - f_{avg} \right| \tag{17} $$

In the present generation, $f_{avg}$ denotes the overall mean
fitness value of the dandelions and seeds, while SN indicates the
total number of dandelions and seeds.
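
The selection step can be illustrated with a short sketch in which each individual's selection probability is proportional to the absolute distance of its fitness from the population mean, as in Formulas (16) and (17). The uniform fallback for a population with identical fitness values is an assumption added here to avoid division by zero, and the fitness values are hypothetical.

```python
def survival_probabilities(fitness):
    """Selection probability of each individual from its distance to the mean."""
    f_avg = sum(fitness) / len(fitness)
    distance = [abs(f - f_avg) for f in fitness]
    total = sum(distance)
    if total == 0:  # all individuals equal: fall back to a uniform choice
        return [1 / len(fitness)] * len(fitness)
    return [d / total for d in distance]

probs = survival_probabilities([1.0, 2.0, 3.0, 6.0])
print(probs)
```

Individuals close to the mean receive little probability mass, so the rule favours the extremes of the population and helps preserve diversity.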


BDA Utilizing Enhanced Seeding Strategy 
(SBDA) 


In our earlier work, we presented a BDA 
that depends on chaos and seeding methods 
(SBDA) to enhance the efficiency of BDA 
further. The development and wilting factors are 
eliminated in SBDA, and the prior best 
populations are used to reorganize the core 
dandelions. The array's initial seed will be 
modified if it is filled. The following equation 
updates the CDs' seeding radius before the K" 
iteration: 


Scope(t + 1); = Scope(t); +r sinsin(r) (18) 


In the t" formation of core dandelions, 
Scope(t); represents the radius of seeding in the 
i dimension, and r is a random integer. Here is 
the equation used to adjust the CD’s seeding 
radius following the K" iteration: 


Scope(t + 1); = scope(t)i +m X¥(2X1% 
average(i) — 1) 


* (19) 


In the following equation, average(i) 
reflects the mean of all seeds of the i dimensional 
historical optimum individuals while r,, and r, 
represent two randomized numerals: 


vi, Best(i,j) (20) 


average(j) = - 


We also incorporate chaotic individuals
into SBDA. The chaotic map that we select is
the tent map, and its equation is outlined below:


$$p(j+1) = \begin{cases} 2\,p(j), & 0 < p(j) < 0.5 \\ 2\,(1 - p(j)), & 0.5 \le p(j) < 1 \end{cases} \tag{21}$$
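
A minimal sketch of how a chaotic sequence could be generated with the tent map above. The function names, the starting value, and the sequence length are illustrative assumptions; how the resulting values are mapped onto chaotic individuals in SBDA is not specified here and is left out.

```python
def tent_map(p):
    """One tent-map step: 2p for p < 0.5, and 2(1 - p) otherwise."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 2.0 * p if p < 0.5 else 2.0 * (1.0 - p)


def tent_sequence(p0, length):
    """Return `length` successive tent-map values, starting from p0."""
    sequence = [p0]
    for _ in range(length - 1):
        sequence.append(tent_map(sequence[-1]))
    return sequence


# Hypothetical seed value. With binary floating point, very long orbits tend
# to collapse to 0, so practical implementations usually keep sequences short
# or perturb the state occasionally.
chaotic_values = tent_sequence(0.3, 5)
```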


3.4. Classification 


To enhance intrusion detection reliability
and effectiveness, a deep learning architecture,
the improved residual dense network (IRDN),
is used for classification in the network
intrusion detection system (NIDS).


Improved Residual Dense Network


One deep learning framework that is well- 
known for its efficiency in various tasks related to 
computer vision is the residual dense network 
(RDN). An RDN is a deep convolutional neural 
network that uses dense and residual connections 
to rebuild data at super-resolution. 


In a traditional CNN, the input of each layer is the
result of the preceding layer. However, issues like
exploding and vanishing gradients can arise
from this. To address this issue, residual
connections are incorporated into the RDN.
Residual connections, which add the input to
each residual block's final result, allow the
network within each residual block to learn the
difference between its input and its output. This
improves training stability and permits a
deeper network.


On the other hand, dense connections
combine the inputs and outputs of each block,
allowing the network to train on all of the data
from previous levels. This mitigates the vanishing
gradient problem and enhances the network's
ability to extract features. As a result, the
RDN can broaden the network and increase
efficiency. The fundamental unit of the RDN is
the residual dense block (RDB), which combines
the residual and dense links with a composite
function of batch normalization (BN), ReLU,
convolution (Conv), and other operations.

To increase its capabilities, we can add 
more layers, such as spatial and channel attention. 
The layers above will assist the network to focus 
more intently on important spatial locations and 
channel-wise connections within the data. 


The spatial attention layer is applied
after each residual dense block. The average
information across the spatial dimensions is
first obtained by a global average pooling
operation. The resulting vector is reshaped to a 1
x 1 spatial dimension and passed to a dense layer
with a sigmoid activation. This operation, which
assigns attention weights to different spatial
regions, enables the network to highlight salient
features.

Similarly, the channel attention layer
comes after every residual dense block. It includes
two global pooling procedures across channel
dimensions: mean and maximum pooling. The
concatenated results are passed to a dense layer
with a sigmoid activation. This process captures
inter-channel correlations and enables the
network to weigh channel-specific features
adaptively. These attention layers come after
every residual dense block in the RDN design,
integrating smoothly with it. Once the attention
operations are finished, a 1 x 1 convolution
layer is added to enable global residual
learning. This layer guarantees information flow
from the input to the output and improves the
model's capacity to capture global features.

The characteristics are then combined 
using a global average pooling operation, and the 
categorization result is generated by a dense layer 
using a softmax activation function. The dense 
layer’s class count should correspond to the task’s 
categorization specifications. 

The input is initially convolved in the RDB,
and the tensor is batch-normalized following the
convolution, giving the tensor T. In this case, T1 is
the residual concatenation of T and the input
layer, and T2 is T activated by the LeakyReLU
function.


T = N(C(I)) (22)


In Formula (22), operator N denotes the
normalization operation, operator C the
convolution operation, and I the input layer. The
normalized tensor in the RDB is denoted as T.


Throughout the model, the initial RDB
block uses T1 as its output tensor. This block
passes the tensor on to the subsequent RDB block;
this approach is helpful in super-resolution (SR).
However, it carries a significant weight in
classification, which impacts the classification's
precision and effectiveness. To resolve this issue,
we eventually use the tensor T2, which does not
have residual concatenation in the RDB, for
residual concatenation.


T1 = Concate (T,I) (23) 


T2 =L(T) (24) 


Concate stands for the residual concatenation
operation in Equation (23). T2 is the tensor that
comes after LeakyReLU with an alpha of 0.3. T1
is the tensor concatenated from T and I, while L
indicates the LeakyReLU operation.
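
Equations (23) and (24) can be illustrated on a toy one-dimensional tensor. The sketch below assumes that the normalized tensor T and the input I are plain Python lists with made-up values; the convolution C and the normalization N of Equation (22) are not modelled, and the helper names are hypothetical.

```python
def leaky_relu(tensor, alpha=0.3):
    """LeakyReLU with the alpha of 0.3 stated in the text (operator L)."""
    return [value if value > 0 else alpha * value for value in tensor]


def concate(first, second):
    """Residual concatenation of two 1-D tensors (operator Concate)."""
    return list(first) + list(second)


input_layer = [0.5, -1.0, 2.0]     # I: hypothetical input values
normalized = [-2.0, 0.0, 1.5]      # T: stands in for N(C(I))

t1 = concate(normalized, input_layer)   # Equation (23)
t2 = leaky_relu(normalized)             # Equation (24)
```

Here `t1` carries both T and I forward, whereas `t2` keeps the size of T and only rescales its negative entries.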


The primary goal of the spatial attention
layer is to draw attention to significant spatial
locations in the feature maps. The spatial
attention mechanism is defined as follows.


The spatial attention S is computed
for an input feature map X with
dimensions H × W × C, where H
indicates the height, W the width,
and C the number of channels.

The Global Average Pooling (GAP) is
represented as


$$G = \mathrm{GAP}(X) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X(i,j,:) \tag{25}$$


The sigmoid activation function in
the dense layer is employed as


$$S_{spatial} = \sigma\left(\mathrm{Dense}(G_{reshape})\right) \tag{26}$$


The element-wise multiplication of the
combined layer is represented as


$$X_{attended} = X \odot S_{spatial} \tag{27}$$


In this case, element-wise multiplication is
indicated by $\odot$, and $\sigma$ stands for the sigmoid
activation function. The channel attention
layer captures the inter-channel relationships of
the feature map. The channel attention
mechanism can be expressed using the formulas that
follow.

The sigmoid activation function in the
dense layer is employed as


$$S_{channel} = \sigma\left(\mathrm{Dense}(G_{concat})\right) \tag{28}$$


The element-wise multiplication of the combined layer
is represented as


$$X_{attended} = X \odot S_{channel} \tag{29}$$
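
The channel attention path (global mean and maximum pooling, concatenation, a sigmoid-activated dense layer, and element-wise rescaling) can be sketched in plain Python on a tiny H × W × C feature map. This is only an illustration under stated assumptions: the dense layer weights are hypothetical (all zeros, so every gate equals 0.5), the dense layer is assumed to output one weight per channel, and broadcasting that weight over all spatial positions is an assumed reading of Equation (29).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean(values):
    return sum(values) / len(values)


def global_pool(x, reduce_fn):
    """Reduce an H x W x C nested list over its spatial axes, per channel."""
    channels = len(x[0][0])
    return [reduce_fn([pixel[c] for row in x for pixel in row])
            for c in range(channels)]


def dense_sigmoid(vector, weights, bias):
    """Fully connected layer followed by a sigmoid, one output per weight row."""
    return [sigmoid(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def channel_attention(x, weights, bias):
    g_concat = global_pool(x, mean) + global_pool(x, max)   # mean and max pooling
    s_channel = dense_sigmoid(g_concat, weights, bias)      # Equation (28)
    # Equation (29): rescale every channel by its attention weight.
    return [[[value * s_channel[c] for c, value in enumerate(pixel)]
             for pixel in row] for row in x]


# Hypothetical 2 x 2 x 2 feature map and all-zero dense parameters.
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
weights = [[0.0] * 4 for _ in range(2)]
bias = [0.0, 0.0]
attended = channel_attention(x, weights, bias)
```

With trained weights, the gates would differ per channel, so informative channels are amplified relative to the others.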


The RDN design applies these layers of
attention after every residual dense block, which
helps the framework adaptively focus on pertinent
spatial positions and channel-wise data
throughout the NIDS classification process.
Subsequently, the residual connection operation
produces a new tensor (T3), created from the block
outputs of the 3-layer RDB and the original
input tensor.


T3 = R^3(C(C(I))) (30)


Operator R^3 in Formula (30) denotes the 3-layer
RDB operation. Following three pooling
operations, the input layer can be
reloaded for the residual connections to
improve classification accuracy. Equation (31) then shows that
tensor T3 is used to perform the residual concatenation
operation, and the resultant tensor T4 prepares the
data for classification.


T4 = Concate(T3, P^3(C(I))) (31)


P^3 represents the third pooling operation.
We incorporate a dense layer for classification.
The optimizer uses the
Lyrebird optimization algorithm (LOA), the loss
function is cross-entropy, and, to avoid
overfitting, an L2 regularizer is
included in the dense layer.

To handle the multi-class classification
problem more efficiently, the output layer uses
the softmax activation function, and the loss
layer uses the cross-entropy function.


$$\mathrm{loss}(x, class) = -x[class] + \log\left(\sum_{j} \exp(x[j])\right) \tag{32}$$
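
The softmax cross-entropy loss above can be evaluated directly from the raw class scores. The sketch below is a minimal illustration that subtracts the maximum score before exponentiating (the log-sum-exp trick) so that large scores do not overflow; the function name and the example scores are assumptions.

```python
import math


def cross_entropy(scores, target):
    """loss(x, class) = -x[class] + log(sum_j exp(x[j])).

    `scores` are the raw (pre-softmax) outputs and `target` is the index of
    the true class. Shifting by the maximum keeps exp() within range.
    """
    shift = max(scores)
    log_sum_exp = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return -scores[target] + log_sum_exp


# Hypothetical scores for a two-class case with equal evidence.
loss_uniform = cross_entropy([0.0, 0.0], target=0)
# A confident, correct prediction gives a loss close to zero.
loss_confident = cross_entropy([1000.0, 0.0], target=0)
```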


The improved residual dense network (IRDN)
for an NIDS operates by integrating residual and
dense connections. The
vanishing gradient issue in deep networks is
mitigated by residual connections, which enable
the network to learn residual functions by
utilizing skip connections. Dense connections link
every layer to every layer before it, making
feature reuse and information flow easier. These
coupled connections allow the network to
automatically learn and reuse pertinent features
from network traffic data in a residual dense
network for NIDS classification, improving the
network's capacity to distinguish between
malicious and legitimate activity.


3.5. Parameter Optimization using Lyrebird 
Optimization Algorithm 

The rationale for the proposed lyrebird
optimization algorithm (LOA) and its
mathematical structure for application in
optimization scenarios are presented in this
section. By mimicking lyrebird behaviour
in the wild, the LOA optimizes the parameters.
The population consists of candidate parameter
configurations, and individuals imitate the
characteristics of the fittest options during a
mimicry phase. Then,
the method introduces random disturbances to
maintain diversity while exploring the parameter
space. Finally, the fitness assessment guides the
adaptation process to select superior options for
the subsequent iteration. The LOA effectively
explores and adjusts to the NIDS parameter space
by utilizing mimicry and adaptation to identify
ideal configurations for improved intrusion
detection effectiveness.


3.5.1. Inspiration of LOA


The superb lyrebird and Albert's lyrebird
are the two lyrebird species, both native to
Australia. These
magnificent birds belong to the family
Menuridae. They can remarkably mimic both
artificial and natural sounds from their
surroundings. They are among the most recognizable
native birds of Australia and have a distinctive plume
of neutral-coloured tail feathers. Male superb
lyrebirds measure 80-98 cm long, while females
are 74-84 cm. In contrast, the maximum size of a
female Albert's lyrebird is 84 cm, while the
maximum size of a male is 90 cm. Although they
are similar in some ways, Albert's lyrebird has
less elaborate lyrate feathers than the superb
lyrebird. Superb lyrebirds weigh around 0.97 kg,
while Albert's lyrebirds weigh around 0.93 kg.

When the lyrebird detects possible danger,
it carefully surveys its surroundings and then
either flees or hides in a suitable location. The
suggested LOA approach discussed below was
designed by mathematically modelling this
lyrebird strategy in times of danger.


3.5.2. Algorithm Initialization


The suggested LOA is a population-based
metaheuristic algorithm in which the
population comprises lyrebirds. By leveraging the
collective search capability of its members, the
LOA can provide suitable solutions to
optimization problems in an iterative
procedure. As a member of the LOA, every
lyrebird determines the values of the decision
variables according to its position in the
problem-solving space. The
algorithm's population can be represented
mathematically as a matrix by Formula (33).
Equation (34) randomly initializes the LOA
members' positions in the problem-solving space.


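$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_i \\ \vdots \\ X_N \end{bmatrix}_{N \times m} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,d} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{i,1} & \cdots & x_{i,d} & \cdots & x_{i,m} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,d} & \cdots & x_{N,m} \end{bmatrix}_{N \times m} \tag{33}$$


$$x_{i,d} = lb_d + r \cdot (ub_d - lb_d) \tag{34}$$


X represents the LOA population matrix,
$X_i$ denotes the i-th member (a candidate solution), $x_{i,d}$ is
its d-th dimension, N indicates the number of
lyrebirds, m represents the number of decision
variables, r is a number chosen at
random within the interval [0,1], and $lb_d$ and $ub_d$ stand
for the lower and upper bounds of the d-th
decision variable. Since every member of the LOA
represents a candidate solution to the problem,
the objective function can be evaluated for each
of them. Thus,
the number of objective function values
equals the number of
individuals, and the values can be collected in a vector:


$$F = \begin{bmatrix} F_1 \\ \vdots \\ F_i \\ \vdots \\ F_N \end{bmatrix}_{N \times 1} = \begin{bmatrix} F(X_1) \\ \vdots \\ F(X_i) \\ \vdots \\ F(X_N) \end{bmatrix}_{N \times 1} \tag{35}$$


Here, the objective function value of the i-th member is
denoted by $F_i$, and the vector of objective
function values is represented by F. The objective
function values are an
appropriate criterion for judging the quality
of the candidate solutions.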


Additionally, as the lyrebirds' positions within the 
problem-solving space change with every 
iteration, the optimal candidate solution must also 
be updated based on assessing the function's 
objective value. 
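
The initialization step can be sketched as follows: each coordinate is drawn as the lower bound plus a random fraction of the bound interval, and the objective function is then evaluated for every member. The sphere function, the bounds, the population size, and all function names are assumptions chosen only to make the example runnable; in the NIDS setting the objective would be a validation score of the IRDN.

```python
import random


def initialize_population(n, lower, upper, rng):
    """Draw n members with x_{i,d} = lb_d + r * (ub_d - lb_d), r in [0, 1)."""
    return [[lb + rng.random() * (ub - lb) for lb, ub in zip(lower, upper)]
            for _ in range(n)]


def evaluate(population, objective):
    """Objective value of every member (the vector F)."""
    return [objective(member) for member in population]


def sphere(x):
    """Toy minimization objective used only for illustration."""
    return sum(value * value for value in x)


rng = random.Random(0)                       # fixed seed for repeatability
lower, upper = [-5.0, 0.0, 1.0], [5.0, 1.0, 3.0]
X = initialize_population(6, lower, upper, rng)
F = evaluate(X, sphere)
best_index = min(range(len(F)), key=F.__getitem__)   # best candidate so far
```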


3.5.3. Mathematical Modelling of LOA


The suggested LOA
updates the population members'
positions in every iteration based on a
mathematical model of the lyrebird's
reaction to threats. The population update
procedure has two phases, (i) escaping and
(ii) hiding, based on the lyrebird's decision in this
situation. Equation (36) simulates the lyrebird's
decision-making procedure in the LOA design
when selecting between escaping and hiding from
danger. Hence, only one of the two phases
updates every LOA member's location during an
iteration.


$$\text{Update process for } X_i: \begin{cases} \text{based on Phase 1}, & r_p \le 0.5 \\ \text{based on Phase 2}, & \text{else} \end{cases} \tag{36}$$


where $r_p$ indicates a random number
from the range [0,1].


Phase 1: Escaping Strategy (Exploration 
Phase) 


Based on a model of the bird's flight
from the dangerous position to the safe areas, a
population member's position is updated in
the search space in this phase of the LOA.
Moving to a safe area produces significant
positional changes and lets the lyrebird explore
new locations in the problem-solving space, which
reflects the LOA's global search (exploration)
capability. For every member
of the LOA, the positions of other population
members with better objective function values
are considered safe areas.


$$SA_i = \{X_k : F_k < F_i \text{ and } k \in \{1, 2, \ldots, N\}\}, \quad \text{where } i = 1, 2, \ldots, N \tag{37}$$


Here, $SA_i$ is the set of safe areas for the i-th
lyrebird, and $X_k$ is the k-th row of the
X matrix, which has a better objective function value
than the i-th LOA member ($F_k < F_i$).
The lyrebird is assumed to flee randomly towards
one of these safe areas in the LOA design.
Formula (38) determines a new position for
every LOA member based on the lyrebird
movement modelled in this phase. The
new position then replaces the previous position of the
corresponding member according to Formula (39) if the value
of the objective function is improved.


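$$x_{i,j}^{P1} = x_{i,j} + r_{i,j} \cdot (SSA_{i,j} - I_{i,j} \cdot x_{i,j}) \tag{38}$$


$$X_i = \begin{cases} X_i^{P1}, & F_i^{P1} < F_i \\ X_i, & \text{else} \end{cases} \tag{39}$$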


Here, $SSA_i$ is the safe area selected for the i-th
lyrebird, and $SSA_{i,j}$ is its j-th dimension; $X_i^{P1}$ is the
new position computed for the i-th lyrebird by the
escaping strategy of the suggested LOA, and
$x_{i,j}^{P1}$ is its j-th dimension; $F_i^{P1}$ is its
objective function value; $r_{i,j}$ are random numbers
from the range [0,1]; and $I_{i,j}$ are
numbers selected at random as 1 or 2.
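
One escaping update for a single lyrebird can be sketched as follows, assuming a minimization problem: the set of safe areas is built from the members with a better objective value, one of them is picked at random, a candidate position is computed coordinate by coordinate, and the candidate is kept only if it improves the objective. The toy sphere objective, the population values, and the function names are assumptions; bound handling is omitted for brevity.

```python
import random


def sphere(x):
    """Toy minimization objective used only for illustration."""
    return sum(value * value for value in x)


def escape_step(population, fitness, i, objective, rng):
    """Phase 1 update for member i; returns (position, objective value)."""
    # Safe areas: members whose objective value is better than member i's.
    safe_areas = [k for k in range(len(population)) if fitness[k] < fitness[i]]
    if not safe_areas:                       # the best member has nowhere to flee
        return population[i], fitness[i]
    ssa = population[rng.choice(safe_areas)]  # selected safe area
    x_i = population[i]
    # x_new = x + r * (SSA - I * x), with r in [0, 1) and I in {1, 2}.
    candidate = [x + rng.random() * (s - rng.choice((1, 2)) * x)
                 for x, s in zip(x_i, ssa)]
    f_candidate = objective(candidate)
    if f_candidate < fitness[i]:             # greedy acceptance
        return candidate, f_candidate
    return x_i, fitness[i]


rng = random.Random(1)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(5)]
fitness = [sphere(member) for member in population]
updated = [escape_step(population, fitness, i, sphere, rng)
           for i in range(len(population))]
```

Because of the greedy acceptance, no member's objective value can get worse in this step.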


Phase 2: Hiding Strategy (Exploitation Phase)


During this phase of the LOA, a population
member's position is updated in the search space
according to the lyrebird's modelled strategy of
hiding in a nearby safe area. The lyrebird
surveys its surroundings carefully and moves in
small steps to find a suitable hiding place; these
small changes in position reflect the LOA's
exploitation capability in local
search.

In the LOA design, a new position is
computed for every LOA member using
Formula (40), based on modelling the
lyrebird's movement towards a nearby
area suitable for hiding. If the new
position improves the value of the objective
function as per Formula (41), it replaces
the previous position of the corresponding member.


$$x_{i,j}^{P2} = x_{i,j} + r_{i,j} \cdot (SSA_{i,j} - I_{i,j} \cdot x_{i,j}) \tag{40}$$


$$X_i = \begin{cases} X_i^{P2}, & F_i^{P2} < F_i \\ X_i, & \text{else} \end{cases} \tag{41}$$


Here, $X_i^{P2}$ is the new position computed for the
i-th lyrebird by the hiding strategy, $x_{i,j}^{P2}$ is its
j-th dimension, and $F_i^{P2}$ is its
objective function value.


3.5.4. Computational Complexity


This section assesses the computational
cost of the suggested LOA technique,
considering both time and space complexity. The
initialization procedure, the evaluation of the
objective function, and the population update
contribute to the time complexity of the
LOA as follows:


• The time complexity of the LOA's
  initialization is O(Nm),
  where m is the number of decision variables
  in the problem and N is the
  total number of lyrebirds.


• The objective function is evaluated for every
  lyrebird in every iteration.
  Consequently, the time complexity of
  computing the objective function is
  O(NT), where T is the maximum number of
  LOA iterations.


• Every lyrebird is updated at random in
  either the hiding or the escaping phase
  in every iteration. As a result,
  the time complexity of the lyrebird
  update process is O(NmT).


Figure 2 displays an illustration of the LOA.


Figure 2. Flowchart of the LOA. Recoverable steps:
input the optimization problem information (variable
intervals, objective function, and constraints); set the
population size (N) and the number of iterations (T);
create and evaluate the initial population; calculate
and update the new positions.


4. RESULTS AND DISCUSSION


This section discusses the experimental
findings. We validate
our proposed method on three datasets and
estimate its effectiveness. In the first stage, a set
of data instances is used as the training
dataset to construct the classifier.
In the second stage, the classifier is evaluated
using the testing dataset. Two experiments were
conducted to analyze the efficiency of the model.
In the first, multi-class classification is
performed, and in the second, the model is
compared with existing techniques. We also
designed comparative experiments against
state-of-the-art machine learning
and deep learning techniques on the other datasets.


4.1. Experimental Setup 


The results were obtained on the Python platform
using a PC equipped with an i5
processor and 8 GB of RAM. The efficacy of deep
learning models' classification can be assessed
using various techniques. The test setup is shown
in Table 1.


$$\mathrm{Specificity} = \frac{TN}{FP + TN} \tag{44}$$


Table 1. Setup.


Project | Environment
RAM | 16 GB
System | Python
Processor | Intel i5 2.60 GHz
Anaconda | 4.5.11
Python | 3.9
Backend | TensorFlow


4.2. Dataset Descriptions 


We examine three publicly accessible
intrusion detection datasets, the BoT-IoT, WSN-DS,
and CICDDoS2019 datasets, which have
been extensively used in previous investigations.

BoT-IoT dataset: This new NIDS dataset covers
11 common updated attacks, including DDoS, theft,
reconnaissance, and denial of service (DoS).
Over five days,
Bot-IoT2019 generated a significant volume of
traffic packets and attack types. The dataset
includes 3,119,345 records and 15
features, with five class labels (four
attack labels and one normal label).

CICDDoS2019 dataset: The present
investigation uses the CICDDoS2019 dataset,
which has become widely used for DDoS attack
detection and classification. The collection
includes many recent, real DDoS attack
samples and benign instances.


$$\mathrm{FAR} = \frac{FP}{TN + FP} \tag{45}$$

WSN-DS: WSN-DS was created in 2016 to
distinguish between legitimate and malicious
communication; it uses sensors to track
the number of nodes in wireless networks. The
LEACH routing protocol is used to collect the
records in this dataset, which are represented
by 23 features. In addition to normal records,
the dataset contains four types of DoS
attacks: flooding, grayhole, blackhole,
and TDMA (scheduling).


4.3. Evaluation Metrics 


To evaluate the efficacy of our approach,
we calculate the false alarm rate, specificity,
accuracy, and recall (sensitivity). Additionally, we evaluate
the precision and F1-score of the framework. These
metrics are defined as follows:


4.3.1. Accuracy 


Accuracy is the system's capacity to correctly
determine whether a given behaviour is an attack
or normal operation.
Equation (42) can be used to compute the
accuracy value.


$$\mathrm{Accuracy} = \frac{TP + TN}{TN + FN + TP + FP} \tag{42}$$


4.3.2. Sensitivity 


Sensitivity measures the proportion of real
attacks in the incoming activity that the system
correctly identifies as attacks.
Equation (43) can be used to obtain the sensitivity
value.


$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{43}$$


4.3.3. Specificity 


Specificity, as opposed to sensitivity,
indicates the proportion of normal incoming
activity that the system correctly identifies
as normal. Equation (44) is
used to compute it.


4.3.4. False Alarm Rate 


The false alarm rate is the proportion of
normal activity that is mistakenly flagged as an
attack. The more normal activity is flagged as an
attack, the higher the false alarm rate.
Equation (45) is used to compute it.
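
The four metrics of this subsection can be computed from the confusion-matrix counts as in the sketch below. It assumes the standard definitions (sensitivity as TP / (TP + FN) and false alarm rate as FP / (TN + FP)); the function name and the example counts are hypothetical and are not results from this study.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and false alarm rate from counts."""
    return {
        "accuracy": (tp + tn) / (tn + fn + tp + fp),
        "sensitivity": tp / (tp + fn),      # recall / detection rate
        "specificity": tn / (fp + tn),
        "false_alarm_rate": fp / (tn + fp),
    }


# Hypothetical counts: 100 attack records and 100 normal records.
scores = detection_metrics(tp=90, tn=95, fp=5, fn=10)
```

Note that specificity and the false alarm rate always sum to one, since both are computed over the normal records only.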


# Experiment 1 (Assessment on BoT-IoT Dataset)


When assessing the effectiveness of NIDSs 
in IoT environments, the BoT-IoT dataset is a 
valuable tool. The complexity and difficulties 
involved in protecting IoT networks are captured 
in this dataset, which includes a broad range of 
genuine cases. Evaluating NIDSs’ performance 
on datasets like BoT-IoT is crucial for 
determining how resilient and adaptable they are 
to new threats as academics and practitioners 
work to improve their capabilities. The class-wise 
evaluation of the dataset 1 is shown in Table 2. 


Table 2. Training and testing class-wise evaluation
using the BoT-IoT dataset.


Attacks | Training Data: ACC (%), Recall (%), Precision (%), FPR (%); Testing Data: ACC (%), Recall (%), Precision (%), FPR (%)
DoS | 99.29, 98.97, 99.03, 0.05, 98.96, 99
Reconnaissance | 99.12, 99.06, 99.11, 0.08, 99.06, 99.09


The BoT-IoT dataset contains several attack
types. The multi-class classification results on dataset 1
are illustrated in Figure 3. For every attack class,
the suggested model achieves strong multi-class
classification performance.
Every class attains an
accuracy of at least 99%, which is close to
ideal.


Figure 3. Multi-categorization outcome of BoT-IoT
dataset. (a) Estimation of accuracy, precision, and
recall. (b) Comparison of FPR.


Table 3 and Figure 4 compare existing
approaches with the proposed approach using the
BoT-IoT dataset. Existing approaches such as
RNN, DeepDCA, MLP, and CNN are used
for comparison with our proposed approach. Compared
with the other methods, the presented
approach yields a better result: 99.93%
accuracy, 99.21% precision, 98.74% recall, and
an F1-score of 99.47%.


Table 3. Evaluation based on the dataset 1.


Reference | Approach | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
[26] | RNN | 99.91 | - | - | -
[27] | DeepDCA | 98.73 | 99.17 | 98.36 | 98.77
[28] | MLP | 70.55 | 43.82 | 74.39 | 55.15
[29] | CNN | 99.02 | 99.09 | 99.07 | 99.03
Our work | Proposed method | 99.93 | 99.21 | 98.74 | 99.47

Figure 4. Differentiation of existing approach with
proposed utilizing BoT-IoT dataset.


# Experiment 2 (Assessment on CICDDoS2019 Dataset)


Multiple experiments are conducted on
dataset 2 to determine the proposed method's
effectiveness. Table 4 displays the results of the
multi-class identification process.


Table 4. Training and testing class-wise evaluation
using the CICDDoS2019 dataset.


Attacks | Training Data: ACC (%), Recall (%), Precision (%), FPR (%); Testing Data: ACC (%), Recall (%), Precision (%), FPR (%)
 | 99.09, 98.88, 0.11, 99.05, 98.86, 99, 0.12
 | 99.12, 99.05, 0.09, 99.11, 99.05, 98.80, 0.09
 | 98.97, 99.04, 0.05, 99, 0.05
NetBIOS | 99.01, 98.94, 99.03, 0.8
SYN | 99, 99.07, 98.95, 0.7
MSSQL | 98.87, 99.01, 0.08
UDP | 99.21, 99.13, 99.04, 0.06
 | 99.23, 99.23, 99.07, 99.18, 0.02


The multi-categorization outcome of the
CICDDoS2019 dataset is shown in Figure 5. The
presented approach obtains high performance
in every attack category. Whereas the other
attacks reach more than 99% accuracy, NTP and
MSSQL show lower accuracy.


Figure 5. Multi-categorization outcome of
CICDDoS2019 dataset. (a) Estimation of
accuracy, precision, and recall. (b) Comparison of
FPR.


Differentiation of the existing approaches
with the proposed approach on dataset 2 is shown in Figure 6
and Table 5. The proposed approach yields
greater performance on the CICDDoS2019
dataset.


Table 5. Comparison to similar approaches using the
CICDDoS2019 dataset.


Approaches |  |  |  | Recall (%)
DRCNN [30] | 98.89 | 99.12 | 99.32 | 99.06
MLP [31] | 84.4 | 92.5 | 89 | 94.2
(AE)+MLP [32] | 97.91 | 98.34 | 98.18 | 98.48
Bi-LSTM [33] | 97.93 | 98.18 | - | 99.84
CNN [33] | 93.3 | 95.4 | 92.8 | 92.4
Proposed | 99.11 | 99.23 | 99.09 | 99.09


Figure 6. Differentiation of existing approach with
proposed utilizing CICDDoS2019 dataset.


# Experiment 3 (Assessment on WSN-DS Dataset)


In this phase, we use the WSN-DS dataset
to assess the efficacy of the suggested method.
The class-wise performance on the WSN-DS
dataset is shown in Table 6.


Table 6. Training and testing class-wise evaluation
using the WSN-DS dataset.


Attacks | Training Data: ACC (%), Recall (%), Precision (%), FPR (%); Testing Data: ACC (%), Recall (%), Precision (%), FPR (%)
Blackhole | 99.18, 99.14, 99.06, 0.09, 99.17, 99.12
Grayhole | 99.16, 99.05, 99, 0.05, 99.16, 99.05, 0.06
Scheduling | 99.07, 0.03, 99.07, 0.03
Flooding | 99.31, 99.06, 98.97, 0.03, 99.27, 99.06, 98.96, 0.02


Multiple categorizations of attacks are
shown in Figure 7. Compared with
existing approaches, the presented approach has
superior performance on all attacks.


Figure 7. Multi-categorization outcome of WSN-DS
dataset. (a) Estimation of accuracy, precision, and
recall. (b) Comparison of FPR.


Table 7 and Figure 8 compare the proposed
approach with existing approaches on the WSN-DS
dataset. Compared with prior methods, the
presented approach yields better performance
in F1-score, precision, recall, and accuracy.


Table 7. Evaluation based on the WSN-DS dataset.


Reference | Approach | Accuracy (%) | Detection Rate (%)
[34] | LR | 97 | 71.7
[34] | NB | 83.1 | 76.5
[34] | DT | 99.1 | 95.1
[35] | CNN-LSTM | 99.58 | 97.77
Our work | Proposed method | 99.64 | 98.87


Figure 8. Differentiation of existing approach
utilizing WSN-DS dataset.


# Experiment 4 (Hyperparameter Configuration)


The F1-score was used to determine the suggested
technique's hyperparameters, including the learning
and dropout rates. Various hyperparameter tuning
strategies can be used to determine the best
settings for hyperparameters. Nonetheless, grid
search is sufficient to identify the ideal
hyperparameter values because the suggested
NIDS only considers two parameters. Finding the
ideal values is made simple by the upward convex
shape of the grid search results, as seen in Figure
9.


Figure 9. Hyperparameter configuration of
proposed datasets


Regarding the ideal value set depicted in 
the images, the dropout was set for our suggested 
dataset at 0.0032 and 0.001. For every dataset, the 
learning rate was set to 0.01. 
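
A two-parameter grid search of the kind described in this experiment can be sketched as follows. The scoring function here is a hypothetical stand-in with a single peak; in practice it would train the model with the given learning rate and dropout rate and return the validation F1-score. The grid values and all names are assumptions for illustration and are not the grids used in this study.

```python
from itertools import product


def grid_search(learning_rates, dropout_rates, score_fn):
    """Return (best_score, learning_rate, dropout_rate) over the full grid."""
    best = None
    for learning_rate, dropout_rate in product(learning_rates, dropout_rates):
        score = score_fn(learning_rate, dropout_rate)
        if best is None or score > best[0]:
            best = (score, learning_rate, dropout_rate)
    return best


def toy_f1(learning_rate, dropout_rate):
    """Hypothetical single-peaked score standing in for a validation F1-score."""
    return 1.0 - (learning_rate - 0.01) ** 2 - (dropout_rate - 0.2) ** 2


best_score, best_lr, best_dropout = grid_search(
    learning_rates=[0.001, 0.01, 0.1],
    dropout_rates=[0.1, 0.2, 0.3],
    score_fn=toy_f1,
)
```

With only two hyperparameters the grid stays small, which is why an exhaustive search is affordable in this setting.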


# Experiment 5 (Assessment of Training and 
Testing) 


Figure 10 depicts the categorization loss 
and accuracy percentage of the IDS plotted with 
the number of iterations. As the graphic 
illustrates, this paper's approach produces a 
practical convergent effect. Within the entire 
dataset, there were distinct phases for training and 
testing. In this inquiry, 20% of the data is used for 
testing, and 80% is used for training. 


Figure 10. Evaluation of training and testing
(training, smoothed training, and testing curves
plotted against the iteration).


Table 8 represents the differentiation of the 
presented approach from prior research. The 
performance of the proposed approach is better 
than that of other approaches. 


Table 8. Differentiation of related works.


References | Methods | Pros | Cons | Accuracy
Zhou et al. [21] | GNN | Reduces the time for training of classifiers and increases categorization precision | Overfitting issue occurs | [illegible]
[22] | [illegible] | Enhances the detection rate | Its functionality is dependent on input factors | 99015
Al et al. [23] | LSTM-CNN | [illegible] categorization accuracy | [illegible] computational cost | 99.17%
[24] | [illegible]-CNN | Enhanced categorization efficiency | Overfitting issue | 83.58%
[25] | DNN | Resolution of high-dimensional issues | [illegible] dimensional sets of data | 83.33%
Our work | Proposed approach | Enhanced overall performances | It has a significant number of parameters | 99.93%


Table 9 shows the differentiation of the 
proposed feature selection over existing ones. The 
proposed approach was better at feature selection 
than other approaches. 


Table 9. Differentiation of feature selection
algorithms.


Algorithm | Accuracy | TPR | FPR
GA | 94 | 98.1 | 1.28
SSO | 92.4 | 98.5 | 1.74
Cuttlefish | 91.9 | 98 | 1.79
Sigmoid-PIO | 94.7 | 97.4 | 0.97
Cosine-PIO | 96 | 98.2 | 0.76
Proposed | 99.1 | 99 | 0.24

Class imbalance


Various existing
approaches, such as DSSTE, AESMOTE, ADASYN,
WGAN, and SMOTE, are compared
with the presented model. The presented model
yielded better results than the other approaches, as
shown in Table 10.

Table 10. Differentiation of class imbalance
approaches.


Approach | Accuracy | F1-Score
DSSTE | 82.84 | 81.66
AESMOTE | 82.09 | 82.43
ADASYN | 78.97 | -
WGAN | 80.80 | -
SMOTE | 79.10 | 75.76
Proposed | 99.00 | 89.74


The deep-learning-based existing 
approaches are contrasted with those presented, as 
shown in Table 11. The proposed method yields 
superior performances. 


Table 11. Differentiation of prior approaches.


Methods |  |  |  | Accuracy (%)
IHCRNNIDS | 93.47 | 97.98 | 93.47 | 94.58
RNN-ABC | 95.12 | 97.29 | 92.57 | 96.89
CNN | 97.58 | 92.7 | 93.5 | 93.8
LSTM | 91.09 | 93.3 | 92.55 | 94.73
Conv-LSTM | 98.14 | 99.67 | 95.78 | 97.03
Proposed | 99.17 | 99.11 | 99.21 | 99.93


Statistical Significance Test 


We employ the widely used Wilcoxon
signed-ranks test to demonstrate the
statistical significance of the performance
improvement achieved by the proposed approach.
The test is applied to the results obtained with
the MG-GAN strategy and with the original data.
Additionally, we conduct a
test of statistical significance to demonstrate how
our suggested approach significantly improves on the
findings of previous studies in this area. The
Wilcoxon signed-ranks test is a nonparametric
statistical test that ranks the differences
between the outcomes of two methods, here
when choosing features using the dataset. It compares
the ranks of the positive and negative differences,
where the ranking itself ignores the signs. Let $d_i$ represent the
difference in the feature selection approaches'
performance scores on the i-th classification
model. The differences are then ranked according to
their absolute values. In the event of a tie, average
ranks are assigned. Let $R^+$ be the sum of
ranks for which the second method performed
better than the first and $R^-$ the sum of ranks for
the opposite case:


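$$R^{+} = \sum_{d_i > 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i) \tag{46}$$

$$R^{-} = \sum_{d_i < 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i) \tag{47}$$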


Ranks of $d_i = 0$ are split evenly between
the two sums; if there is an odd number of them,
one is disregarded. Let $T$ denote the Wilcoxon
test statistic, taken as the smaller of the two
sums, $T = \min(R^{+}, R^{-})$, evaluated at the
significance level $\alpha = 0.05$.
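
The rank-sum computation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names and the paired scores are hypothetical, and it stops at the statistic (the smaller of the two rank sums) without looking up a critical value or p-value.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def wilcoxon_rank_sums(scores_first, scores_second):
    """Return (R_plus, R_minus, T) for paired scores of two methods."""
    diffs = [b - a for a, b in zip(scores_first, scores_second)]
    # If the number of zero differences is odd, disregard one of them.
    if sum(1 for d in diffs if d == 0) % 2 == 1:
        diffs.remove(0)
    ranks = average_ranks([abs(d) for d in diffs])
    # Ranks of zero differences are split evenly between the two sums.
    zero_share = sum(r for d, r in zip(diffs, ranks) if d == 0) / 2
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0) + zero_share
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0) + zero_share
    return r_plus, r_minus, min(r_plus, r_minus)


# Hypothetical paired scores of two methods on five classification models.
first = [90, 92, 85, 88, 91]
second = [93, 91, 89, 88, 96]
r_plus, r_minus, t_stat = wilcoxon_rank_sums(first, second)
print(r_plus, r_minus, t_stat)
```

For a real analysis, a tested library routine such as `scipy.stats.wilcoxon` would normally be preferred, since it also returns the p-value needed for the comparison against the 0.05 level.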




4.4. Discussion


The main goal of this work is to develop
an efficient intrusion detection system that can
discriminate between legitimate and malicious
traffic. Cybersecurity has grown more complex
because new attacks are discovered daily, and
standard intrusion detection systems have a high
false alarm rate, which leads security analysts to
overlook malicious attempts and leaves the system
open to attack. The data used to train intrusion
detection systems are often outdated and contain
redundant information, which leads to inadequate
training and an inefficient process for training
and evaluating systems. Recently, researchers have
started working on deep-learning-based intrusion
detection systems. According to a recent study,
deep learning performs better than traditional
learning techniques at classifying incoming traffic
and identifying malicious traffic in very large
datasets and continuously attacked environments.

To address the dead neuron problem in the
residual block, the ReLU activation function is
applied after the convolution in each layer, and
LeakyReLU is applied to the tensor after
normalization. The RDN design, which combines the
flexibility of attention mechanisms with the
strength of residual dense blocks, is enhanced by
incorporating these attention layers. The residual
connections facilitate information flow in the
RDN, and the attention layers offer a way to
adjust the significance of features dynamically.
Using the LOA in the optimization layer, the
learning rate was adjusted adaptively from an
initial value of 0.0001 and the loss function was
minimized. Compared with other approaches, the
presented approach yields better performance.
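
The dead neuron problem mentioned above can be illustrated with a small sketch comparing the gradients of ReLU and LeakyReLU. This is not the IRDN implementation; the function names and the negative slope of 0.01 are assumptions chosen for illustration, since the slope used in the paper is not stated here.

```python
def relu(x):
    """Standard ReLU: passes positive inputs, zeroes out the rest."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Gradient of ReLU: zero for non-positive inputs, so no update flows back."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, slope=0.01):
    """LeakyReLU: keeps a small linear response for negative inputs.

    The slope value is an assumed default, not a value from the paper.
    """
    return x if x > 0 else slope * x


def leaky_relu_grad(x, slope=0.01):
    """Gradient of LeakyReLU: small but non-zero for negative inputs."""
    return 1.0 if x > 0 else slope


# Hypothetical pre-activation values of one neuron over four inputs.
pre_activations = [-2.0, -0.5, 0.5, 2.0]
relu_grads = [relu_grad(x) for x in pre_activations]
leaky_grads = [leaky_relu_grad(x) for x in pre_activations]
print(relu_grads)
print(leaky_grads)
```

A neuron whose pre-activations are all negative receives a zero gradient under ReLU and stops learning, whereas LeakyReLU still passes a small gradient that lets it recover.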


4.5. Limitations 


Although the model proposed in this paper
increases detection accuracy, it still has certain
shortcomings: first, it has a large number of
parameters; second, it improves detection accuracy
for minority-class samples, but the improvement is
small. To increase minority-sample identification
accuracy, improve the model's overall
classification performance, and lower the running
time cost, we will investigate making the model
more lightweight in future work.


5. Conclusions and Future Scope 


Weak multi-class classification performance
and insufficient feature extraction are two
common problems with traditional intrusion
detection algorithms. This work presents a novel
approach that efficiently detects and classifies
attacks to address these issues. The framework
uses the modified generator GAN (MG-GAN)
algorithm to tackle dataset imbalance and the
improved binary dandelion algorithm (IBDA) to
remove redundant features. Next, the novel
improved residual dense network (IRDN) is used to
classify the attacks. The lyrebird optimization
algorithm (LOA) is employed to tune the
hyperparameters. On the BoT-IoT, CICDDoS2019, and
WSN-DS datasets, the suggested model's accuracy is
99.93%, 99.23%, and 99.64%, respectively.

As a result, superior detection performance
was attained. Prior NIDSs demonstrated a DR of up
to 93.49% and 98.31% based on the F1-score; in
contrast, the proposed NIDS attained the highest
detection performance of 98.87% and 99.64% in the
effectiveness analysis. Furthermore, compared to
existing approaches, the study shows that the
framework can effectively extract features from
multi-dimensional, massive network data with a low
FPR and high detection accuracy. This was
demonstrated by a number of experiments, including
feature selection analysis, ensemble versus
single-model evaluation, training and testing time
comparison, and evaluation of effectiveness on the
datasets.

Additionally, the framework significantly
improved detection for minority classes, offering
promising real-time applications for IDSs. This
study can be extended in the future by creating an
IDS framework with cutting-edge explainable
artificial intelligence (EAI) models. This
technique can also be used to find attacks by
categorizing mobile traffic. Multi-modal deep
learning will be employed to enhance the overall
efficacy of IDSs.